Abstract


Single nucleotide polymorphisms (SNPs) in the CEBPA gene have been found to
be associated with cancer, especially acute myeloid leukemia (AML).
Identifying functional and structural polymorphisms in CEBPA is therefore
important for studying its potential malfunction and discovering therapeutic
targets. For this purpose, several bioinformatics tools were used to identify
disease-associated nsSNPs that might be vital for the structure and function
of CEBPA. In silico tools used in this study included SIFT, PROVEAN, PolyPhen2,
SNPs&GO and PhD-SNP, followed by ConSurf and I-Mutant. Protein 3D modelling
was carried out using I-TASSER and MODELLER v9.22, while GeneMANIA and
STRING were used to predict gene-gene interactions.
We found that the L345P, R333C, R339Q, V328G, R327W,
L317Q, N292S, E284A, R156W, Y108N and F82L mutations were the most
crucial SNPs. Additionally, the gene-gene interaction analysis identified genes
correlated with CEBPA through co-expression and highlighted its importance in
several pathways. These 11 mutations should be investigated in future studies
of CEBPA-related diseases, especially AML. As the first study of its kind, this
work proposes future perspectives that may help in precision medicine. Animal
models are of great significance for determining the effects of CEBPA in
disease.
Fig. 1. Graphical Abstract


1. INTRODUCTION 


SNPs are the most common genetic variations in the human genome and are widely used in association studies
of quantitative traits and complex diseases’. They occur roughly once every 100-300 bases along the human
genome and represent 90% of human genetic variation. SNPs are found at different densities in both coding
and non-coding regions®, but are most abundant in non-coding regions, including untranslated and regulatory
regions as well as introns. Single nucleotide variations can also affect phenotypic functions and contribute
to the development of disease. SNPs can influence transcription factor binding and gene expression, while
SNPs in untranslated regions (UTRs) may modify transcriptional activity’, ribosomal translation of mRNA and
RNA stability®.


The human CEBPA gene encodes CCAAT/enhancer-binding protein alpha’, a transcription factor involved in the
differentiation of blood cells’®. Alteration of specific genes, including the CBP complex, can lead to arrest of
cell differentiation in AML". This intronless gene encodes a bZIP transcription factor that can bind as a
homodimer to the enhancers and promoters of certain genes, and can also form heterodimers with the distinct
transcription factors CEBP-beta, CEBP-gamma and c-Jun. Because it is essential for myeloid lineage
commitment, CEBPA is required both for the formation of normal mature granulocytes and for the development
of AML’?.


Various studies have reported that 50% of genetic disorders are caused by nsSNPs*?. One study reported
deleterious genetic variations in the ABCA1 gene that could lead to the development of hypoalphalipoproteinemia.
Similar studies identified nsSNPs in the STEAP2 gene that can cause prostate cancer by upregulating the
gene**.


We analyzed nsSNPs of the CEBPA gene to identify the most deleterious ones and to highlight their potential
role in AML. CEBPA can interact with cyclin-dependent kinase 4 and cyclin-dependent kinase 2 $>. CEBPA
mutations have been linked to a good outcome in both pediatric and adult AML patients”. AML is characterized
by genetic abnormalities in hematopoietic progenitors, a block in granulocytic differentiation and excessive
proliferation of blasts. It has been shown that blocking CCAAT/enhancer-binding protein alpha and suppressing
CEBPA expression arrest the differentiation of myeloid progenitors; the roles of CEBPA in granulocyte
differentiation and as a tumor suppressor gene are therefore important in AML prognosis’’. Structural and
functional variants of CEBPA thus need to be identified and studied. In this study, several in silico tools
were used to find nsSNPs that could damage the CEBPA protein, and 3D models of the wild-type protein and of
proteins carrying possibly deleterious nsSNPs were proposed. This in silico protein analysis may be very
helpful in future studies on the treatment of CEBPA-associated diseases caused by nsSNPs.


2. METHODOLOGY 


[Figure: methodology flowchart; legible step labels: Recruiting nsSNPs, Identification of ..., Protein stability, Protein evolutionary conservation, GeneMANIA & STRING, PTMs prediction]

Fig. 2. The graphical illustration of methodology 


This research (Fig. 2) was carried out in several steps using web servers, tools and databases, with GRCh38
as the reference human genome. The methodology is described in detail below.


2.1. Recruiting nsSNPs 


All SNPs of CEBPA were retrieved from the National Center for Biotechnology Information (NCBI) dbSNP database
(accessed 17 July 2020). Missense SNPs were selected from the CEBPA gene window in NCBI using the gene
view.
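Selecting missense SNPs from a downloaded variant list amounts to a filter on the functional-class annotation. The following is a minimal sketch, assuming the list was exported as a tab-separated file; the column names (`rsid`, `function_class`, `protein_change`) and the example rows are hypothetical and do not reproduce any real NCBI export, while `missense_variant` is the Sequence Ontology term for a missense change.

```python
import csv
import io

# Hypothetical tab-separated export of CEBPA variants. Column names and rows
# are illustrative only; adapt them to the file actually downloaded from NCBI.
EXAMPLE_EXPORT = (
    "rsid\tfunction_class\tprotein_change\n"
    "rsEX1\tmissense_variant\tp.Ala10Val\n"
    "rsEX2\tsynonymous_variant\t\n"
    "rsEX3\t3_prime_UTR_variant\t\n"
    "rsEX4\tmissense_variant\tp.Arg20Trp\n"
)


def select_missense(tsv_text: str) -> list:
    """Return the rows whose function class marks them as missense (nsSNPs)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["function_class"] == "missense_variant"]


if __name__ == "__main__":
    for row in select_missense(EXAMPLE_EXPORT):
        print(row["rsid"], row["protein_change"])
```

Keeping the whole row (rather than only the rsID) preserves the protein change needed later when each substitution is modelled.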


2.2. Identification of deleterious nsSNPs 


The effect of nsSNPs on the protein was assessed using four bioinformatics tools: SIFT
(Sorting Intolerant From Tolerant) (https://sift.bii.a-star.edu.sg/www/SIFT_seq_submit2.html), PROVEAN
(Protein Variation Effect Analyzer) (http://provean.jcvi.org/seq_submit.php), SNPs&GO
(https://snps.biofold.org/snps-and-go/snps-and-go.html) and PhD-SNP (Predictor of human Deleterious
SNP) (https://snps.biofold.org/phd-snp/phd-snp.html). SNPs predicted to be deleterious or intolerant were
selected and further screened using PolyPhen2
(Polymorphism Phenotyping v2) (http://genetics.bwh.harvard.edu/pph2/).
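The consensus step above (keep variants flagged by all four tools, then keep those PolyPhen2 rates as probably damaging) can be sketched as a simple filter. This is an illustration, not the authors' pipeline: the cut-offs are assumptions based on commonly used defaults (PROVEAN -2.5, SIFT 0.05, a disease probability of 0.5 for SNPs&GO and PhD-SNP, and 0.957 as the PolyPhen2 HumDiv "probably damaging" bound), and the variant names and scores are hypothetical.

```python
from dataclasses import dataclass

# Assumed cut-offs (commonly used defaults; adjust to the settings actually used)
PROVEAN_CUTOFF = -2.5       # PROVEAN: deleterious at or below -2.5
SIFT_CUTOFF = 0.05          # SIFT: intolerant ("affected") below 0.05
DISEASE_PROB_CUTOFF = 0.5   # SNPs&GO / PhD-SNP: "Disease" above 0.5
POLYPHEN2_PROBABLY = 0.957  # PolyPhen2 HumDiv: "probably damaging" from 0.957


@dataclass
class Prediction:
    variant: str
    provean: float
    sift: float
    snps_go_prob: float
    phd_snp_prob: float
    polyphen2: float


def deleterious_in_all_four(p: Prediction) -> bool:
    """True if PROVEAN, SIFT, SNPs&GO and PhD-SNP all call the variant deleterious."""
    return (
        p.provean <= PROVEAN_CUTOFF
        and p.sift < SIFT_CUTOFF
        and p.snps_go_prob > DISEASE_PROB_CUTOFF
        and p.phd_snp_prob > DISEASE_PROB_CUTOFF
    )


def consensus(predictions):
    """Variants flagged by all four tools."""
    return [p for p in predictions if deleterious_in_all_four(p)]


def shortlist(predictions):
    """Consensus variants that PolyPhen2 also rates as probably damaging."""
    return [p.variant for p in consensus(predictions) if p.polyphen2 >= POLYPHEN2_PROBABLY]


# Hypothetical scores, for illustration only
EXAMPLE = [
    Prediction("VAR_A", -4.9, 0.00, 0.77, 0.75, 1.000),  # deleterious everywhere
    Prediction("VAR_B", -2.9, 0.31, 0.53, 0.57, 0.998),  # SIFT: tolerated
    Prediction("VAR_C", -3.1, 0.01, 0.65, 0.60, 0.600),  # PolyPhen2: possibly damaging
    Prediction("VAR_D", -1.2, 0.00, 0.80, 0.70, 0.990),  # PROVEAN: neutral
]

if __name__ == "__main__":
    print([p.variant for p in consensus(EXAMPLE)])  # VAR_A and VAR_C
    print(shortlist(EXAMPLE))                       # VAR_A only
```

Requiring agreement across tools trades sensitivity for specificity: a variant missed by any single predictor is dropped.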
2.3. Stability analysis of protein


I-Mutant 2.0 (https://folding.biofold.org/i-mutant/) was used to assess the stability of the target protein
upon nsSNP-induced substitutions. This web server predicts the change in stability of the mutated protein
using a support vector machine and reports each prediction with a reliability index (RI) ranging from 0
(lowest reliability) to 10 (highest reliability). The CEBPA protein FASTA sequence was submitted to evaluate
the possible effect of each deleterious nsSNP, with the conditions set to pH 7.0 and 25°C (default
parameters).


2.4. Prediction of protein evolutionary conservation 


Evolutionary conservation of the protein sequence was predicted using ConSurf (https://consurf.tau.ac.il/),
which analyses the phylogenetic relations between homologous sequences?®. A total of 50 homologous
sequences were used to predict the degree of conservation of each amino acid residue. Highly conserved
residues that coincided with deleterious nsSNPs were analysed further. Once the SNP-level analyses were
complete, the next step was structural modelling of the protein.


2.5. Prediction of 3D protein structure 


I-TASSER (Iterative Threading ASSEmbly Refinement) is a hierarchical approach for predicting protein
structure and function. I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/), a template-based 3D
modelling tool, was used to generate 3D models of wild-type CEBPA. All mutant structures were then
generated with MODELLER v9.22, using the wild-type CEBPA model produced by I-TASSER as template.
TM-align (https://zhanglab.ccmb.med.umich.edu/TM-align/) was used to compare the selected mutants with
wild-type CEBPA and to calculate the RMSD (root mean square deviation). Mutants with higher RMSD values
relative to wild-type CEBPA were selected for further study’’. Interactive visualization and analysis of the
molecular features of the resulting protein structures were performed with Chimera v1.11.
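Each mutant model starts from a mutant sequence derived from the wild-type sequence. The sketch below parses the substitution labels used throughout this paper (e.g. L345P: wild-type residue, 1-based position, mutant residue) and applies them to a sequence, refusing to proceed if the wild-type residue does not match, which catches numbering or labelling errors early. It does not call MODELLER, and the toy sequence is hypothetical, not CEBPA.

```python
import re

# One-letter codes of the 20 standard amino acids
_AA = "ACDEFGHIKLMNPQRSTVWY"
_MUTATION = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def parse_mutation(label: str) -> tuple:
    """Split a label such as 'L345P' into (wild-type residue, 1-based position, mutant residue)."""
    match = _MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutation(sequence: str, label: str) -> str:
    """Return the mutant sequence, checking that the wild-type residue matches."""
    wild_type, position, mutant = parse_mutation(label)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside a sequence of length {len(sequence)}")
    found = sequence[position - 1]
    if found != wild_type:
        raise ValueError(f"{label}: expected {wild_type} at position {position}, found {found}")
    return sequence[: position - 1] + mutant + sequence[position:]


if __name__ == "__main__":
    toy_sequence = "MKLVAGRT"  # hypothetical toy sequence, not CEBPA
    print(apply_mutation(toy_sequence, "V4G"))  # MKLGAGRT
```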


2.6. Prediction of PTM sites 


Protein function can be predicted by studying post-translational modifications (PTMs). GPS-MSP v3.0
(http://msp.biocuckoo.org/online.php) was used to predict methylation sites in CEBPA. Phosphorylation sites
at serine, threonine and tyrosine positions of the CEBPA protein sequence were predicted with NetPhos 3.1
(http://www.cbs.dtu.dk/services/NetPhos/), which uses ensembles of neural networks; a threshold of 0.5 was
set, and residues scoring higher than this threshold were predicted to be phosphorylated. In addition,
UbPred (http://www.ubpred.org/) and BDM-PUB (http://bdmpub.biocuckoo.org/prediction.php) were used to
predict ubiquitylation sites. A balanced cut-off was selected for UbPred?°, and lysine residues scoring equal
to or higher than the threshold were predicted to be ubiquitinated.
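The threshold rules above are simple filters over per-residue scores. A minimal sketch, assuming the tool outputs have already been parsed into (residue, position, score) tuples; the records are hypothetical placeholders rather than CEBPA predictions, and the UbPred cut-off passed in the example is likewise a placeholder to be replaced by the cut-off actually used.

```python
from collections import Counter

NETPHOS_THRESHOLD = 0.5            # Section 2.6: score must be higher than 0.5
PHOSPHO_ACCEPTORS = {"S", "T", "Y"}


def phosphorylation_sites(records, threshold=NETPHOS_THRESHOLD):
    """Keep serine/threonine/tyrosine records scoring strictly above the threshold."""
    return [(res, pos, score) for res, pos, score in records
            if res in PHOSPHO_ACCEPTORS and score > threshold]


def ubiquitination_sites(records, cutoff):
    """Keep lysine records scoring equal to or higher than the cut-off."""
    return [(res, pos, score) for res, pos, score in records
            if res == "K" and score >= cutoff]


def count_by_residue(sites):
    """Tally predicted sites per residue type (e.g. how many S, T and Y)."""
    return Counter(res for res, _, _ in sites)


# Hypothetical parsed predictions: (residue, position, score); not CEBPA data
EXAMPLE = [
    ("S", 5, 0.91), ("S", 9, 0.42), ("T", 12, 0.57), ("Y", 20, 0.52),
    ("Y", 31, 0.31), ("K", 40, 0.80), ("K", 44, 0.55),
]

if __name__ == "__main__":
    print(count_by_residue(phosphorylation_sites(EXAMPLE)))
    # Placeholder cut-off; substitute the balanced cut-off actually used
    print(ubiquitination_sites(EXAMPLE, cutoff=0.62))
```

Tallying by residue type is how per-residue summaries such as "16 serine, 6 threonine and 6 tyrosine sites" are obtained from a per-site list.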


2.7. Gene-Gene interaction of CEBPA 


GeneMANIA (http://genemania.org/) can be used to generate hypotheses about gene function, analyse gene
lists and prioritize genes for functional assays. It predicts gene-gene interactions based on pathways,
co-expression, protein domain similarity, genetic interactions, co-localization and physical protein
interactions, and was used here to study the interactions of the CEBPA gene. STRING (https://string-db.org/cgi;
accessed 17 August 2020 by manual search for CEBPA in the search box) was used to predict the effect of
CEBPA nsSNPs on related genes and to observe the association of CEBPA with other genes’’.
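The study queried STRING manually through its web interface. As an alternative, scriptable route (not the authors' method), the sketch below builds a request URL for STRING's REST API `interaction_partners` endpoint and filters partners by combined score. The endpoint, the `identifiers`/`species`/`required_score`/`limit` parameters and the `preferredName_B`/`score` column names reflect our understanding of the STRING API and should be checked against its documentation; no request is sent, and the sample response and its scores are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def string_partners_url(genes, species=9606, required_score=700, limit=20, fmt="tsv"):
    """Build a STRING 'interaction_partners' request URL (no request is sent).

    species 9606 is Homo sapiens; required_score uses STRING's 0-1000 scale.
    """
    params = {
        "identifiers": "\r".join(genes),  # multiple identifiers are separated by %0d
        "species": species,
        "required_score": required_score,
        "limit": limit,
    }
    return f"{STRING_API}/{fmt}/interaction_partners?{urlencode(params)}"


def partners_above(tsv_text, min_score=0.7):
    """(partner, combined score) pairs at or above min_score from a TSV response."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row["preferredName_B"], float(row["score"]))
            for row in reader if float(row["score"]) >= min_score]


# Abbreviated, hypothetical response (real responses have more columns and rows)
SAMPLE_TSV = (
    "stringId_A\tstringId_B\tpreferredName_A\tpreferredName_B\tscore\n"
    "ID_A\tID_B1\tCEBPA\tRUNX1\t0.95\n"
    "ID_A\tID_B2\tCEBPA\tEP300\t0.88\n"
    "ID_A\tID_B3\tCEBPA\tLYZ\t0.45\n"
)

if __name__ == "__main__":
    print(string_partners_url(["CEBPA"]))
    print(partners_above(SAMPLE_TSV))
```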
3. RESULTS


Understanding the functions of SNPs helps to explain phenotypic variation in humans, especially in complex
diseases. However, this requires identifying the functional SNPs, which can lead to disease, for example through
the formation of truncated proteins, within a pool that also contains neutral SNPs.


3.1. Recruited nsSNPs 


dbSNP, the world's largest database of nucleotide variations, houses data from genome-wide association
studies (GWAS). A total of 1616 SNPs were recruited from it, of which 269 were nonsynonymous, 369 were in the
3’UTR, 190 in the 5’UTR, 152 were coding synonymous and 540 were others (Fig. 3). Further analysis showed that
seven of these SNPs introduce stop codons, which have a direct effect on protein structure; the resulting
truncated proteins can lead directly to disease.
[Figure: bar chart of SNP types (y-axis: percentage); legible bar labels 12.53% and 10.02%]
Fig. 3. Percentage of different types of SNPs in the human CEBPA gene.


3.2. Deleterious nsSNPs identification 


All recruited nsSNPs were submitted to four bioinformatics tools, PhD-SNP, SNPs&GO, PROVEAN and SIFT, to
predict their effects on the structure and function of the CEBPA protein.


In PROVEAN, a threshold of -2.5 was set, and variants scoring below it were considered deleterious. According
to PROVEAN, 61 nsSNPs had a deleterious effect.


SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the
physical properties of amino acids, and was therefore used to filter the results further. A tolerance index (TI)
of 0.05 was used as the SIFT threshold, and substitutions scoring below it were considered intolerant
(affected).


SIFT predicted 95 nsSNPs to be intolerant, and PhD-SNP predicted 41 nsSNPs to be disease-associated.
SNPs&GO, a server for predicting single point protein mutations likely to be involved in the onset of human
diseases, labelled 21 nsSNPs as disease-associated.
Fig. 4 shows the results of all tools. We selected the 20 nsSNPs predicted as deleterious by all four tools
(Table 1) and submitted them to PolyPhen2, which classifies variants as benign, possibly damaging or probably
damaging and reports a score from 0 to 1; probably damaging is the most confident prediction. The 15 nsSNPs
(16 amino acid changes) predicted as probably damaging were considered for further analysis.


[Figure: "Deleterious nsSNPs prediction by different tools"; legible tool labels include PROVEAN and SIFT]
Fig. 4. Representation of predicted deleterious nsSNPs by four in silico tools 


Table 1. Most damaging 6 nsSNPs predicted by tools 


SNP ID | Amino acid change | PROVEAN score | PolyPhen2 (HumDiv) score | SNPs&GO RI | SNPs&GO prob. | PhD-SNP RI | PhD-SNP prob. | SIFT prediction | SIFT score


‘sise7eiis7 | casy | 020 | osel |1| osaa | o |os21| Affected | 000 
Crs7e17s2002 |  13asp | 6390 | 0999 |6 |0785 | 3 |0637 | Affected | 0o01 
st30aaas7s9 | Raoa | 3899 | 0999 |o 0493| 1 | oase | Affected | 000- 


Crs36aga2687 | 33a | a899 | 100 |s | (0765 | e | 0793| Affected | 000 
Crs758728582 | Raae | 7aoo | 100 |2 | (0577 | o | 0493| Affected | 000 
‘si422i3es76 | vaze | 6s23 | 100 |3 | (0627 | 2 | 0392| Affected | 000 
‘sigossieait |  R32rw | Gass | 100 |1| (0586 | 2 |0383] Affected | 000- 
rs1392203731 | K298Q | -3.899 | 1.00 | 2 | 0.584 | 2 | 0.386 | Affected | 0.00
rs776590829 | N292S | -4.907 | 1.00 | 5 | 0.767 | 5 | 0.751 | Affected | 0.00
rs1196766447 | E284A | -5.588 | 1.00 | 1 | 0.574 | 4 | 0.311 | Affected | 0.00
rs1379379731 | L220P | -2.663 | 0.331 | 2 | 0.586 | 1 | 0.465 | Affected | 0.00
rs1267025311 | R156W | -2.983 | 1.00 | 6 | 0.798 | 4 | 0.722 | Affected | 0.00
rs1257791760 | C133Y | -2.887 | 0.480 | 6 | 0.819 | 4 | 0.716 | Affected | 0.01
rs1197023470 | C133R | -3.334 | 0.012 | 6 | 0.806 | 5 | 0.739 | Tolerated | 0.10
rs1038352346 | G132C | -3.287 | 0.998 | 5 | 0.736 | 2 | 0.582 | Affected | 0.02
rs1245991358 | Y108N | -2.900 | 0.998 | 1 | 0.525 | 1 | 0.569 | Tolerated | 0.31
rs917977456 | F82L | -3.057 | 0.990 | 3 | 0.649 | 4 | 0.280 | Affected | 0.00
rs1452063514 | D63N | -2.992 | 1.00 | 4 | 0.697 | 4 | 0.717 | Affected | 0.00


3.3. Prediction of CEBPA stability 


I-Mutant was used to predict the stability of the CEBPA protein upon each amino acid substitution caused by the
selected nsSNPs. Each of the 15 selected nsSNPs was submitted individually; I-Mutant predicted that 14 of them
decrease protein stability and one increases it, with a reliability index (RI) ranging from 0 to 10 (Table 2).


Only the G132C substitution was predicted to increase stability, suggesting that the remaining 14 nsSNPs might
cause greater damage by decreasing the stability of the CEBPA protein. G132C was therefore excluded from
further analysis.


Table 2. I-Mutant predictions of CEBPA protein stability upon the selected mutations.

Amino acid change | Stability | RI
L345P | Decrease | 6
R339Q | Decrease | 6
E334K | Decrease | 8
E334Q | Decrease | 6
R333C | Decrease | 6
V328G | Decrease | 7
R327W | Decrease | 6
L317Q | Decrease | 8
K298Q | Decrease | 1
N292S | Decrease | 6
E284A | Decrease | 3
R156W | Decrease | 8
G132C | Increase | 0
Y108N | Decrease | 0
F82L | Decrease | 5
D63N | Decrease | 4
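The exclusion step above can be expressed as a filter over Table 2. The sketch below encodes the table as reported and keeps substitutions predicted to decrease stability; the optional `min_ri` parameter is a hypothetical extension (the study did not filter on RI) showing how low-reliability predictions such as Y108N (RI 0) could be screened out if desired.

```python
# (amino acid change, predicted stability change, reliability index) from Table 2
TABLE2 = [
    ("L345P", "Decrease", 6), ("R339Q", "Decrease", 6), ("E334K", "Decrease", 8),
    ("E334Q", "Decrease", 6), ("R333C", "Decrease", 6), ("V328G", "Decrease", 7),
    ("R327W", "Decrease", 6), ("L317Q", "Decrease", 8), ("K298Q", "Decrease", 1),
    ("N292S", "Decrease", 6), ("E284A", "Decrease", 3), ("R156W", "Decrease", 8),
    ("G132C", "Increase", 0), ("Y108N", "Decrease", 0), ("F82L", "Decrease", 5),
    ("D63N", "Decrease", 4),
]


def destabilising(rows, min_ri=0):
    """Substitutions predicted to decrease stability with RI >= min_ri, in table order."""
    return [change for change, effect, ri in rows if effect == "Decrease" and ri >= min_ri]


if __name__ == "__main__":
    kept = destabilising(TABLE2)
    print(len(kept), "substitutions kept; G132C dropped:", "G132C" not in kept)
    # Hypothetical stricter screen, not applied in the study
    print(destabilising(TABLE2, min_ri=6))
```

With the default `min_ri=0`, all 15 destabilising amino acid changes (from the 14 retained nsSNPs) are kept and only G132C is removed.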


3.4. CEBPA protein evolutionary conservation 


Evolutionary information is needed to study mutations that lead to health problems in humans72. To assess
the possible effects of the selected nsSNPs, ConSurf was used to obtain the conservation profile of the CEBPA
amino acid residues, which also provided a structural representation of the protein. Although results were
obtained for every residue of CEBPA, our prime interest was the location of the identified nsSNPs. According
to ConSurf, L345, L317, L220, C133 and G132 were predicted to be buried; C357, R339, E334, R333, R327, K298,
N292, E284, R156 and D63 to be exposed and functionally important; V328 and F82 to be buried and structurally
important; and Y108 to be exposed. The conservation scores of the selected residues are given in Table 3. The
results showed that nsSNPs located in highly conserved regions were the most damaging to CEBPA protein
structure and function.
Table 3. Conservation profile for the selected residues predicted by ConSurf.

Residue | Conservation score | Prediction
C357 | 9 | Exposed and functionally important
L345 | 8 | Buried
R339 | 9 | Exposed and functionally important
E334 | 9 | Exposed and functionally important
R333 | 9 | Exposed and functionally important
V328 | 9 | Buried and structurally important
R327 | 9 | Exposed and functionally important
L317 | 7 | Buried
K298 | 9 | Exposed and functionally important
N292 | 9 | Exposed and functionally important
E284 | 9 | Exposed and functionally important
L220 | 7 | Buried
R156 | 8 | Exposed and functionally important
C133 | 4 | Buried
G132 | 7 | Buried
Y108 | 2 | Exposed
F82 | 9 | Buried and structurally important
D63 | 9 | Exposed and functionally important
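The labels in Table 3 combine a conservation grade (1 to 9) with a predicted burial state. The sketch below encodes Table 3 as reported and checks it against a simple labelling rule: exposed residues with grade 8 or 9 are "functionally important", buried residues with grade 9 are "structurally important". This rule is an assumption inferred from Table 3 itself, not taken from ConSurf's implementation; the burial states are ConSurf's own predictions.

```python
from collections import Counter

# (residue, ConSurf grade, burial state, label reported in Table 3)
TABLE3 = [
    ("C357", 9, "exposed", "Exposed and functionally important"),
    ("L345", 8, "buried", "Buried"),
    ("R339", 9, "exposed", "Exposed and functionally important"),
    ("E334", 9, "exposed", "Exposed and functionally important"),
    ("R333", 9, "exposed", "Exposed and functionally important"),
    ("V328", 9, "buried", "Buried and structurally important"),
    ("R327", 9, "exposed", "Exposed and functionally important"),
    ("L317", 7, "buried", "Buried"),
    ("K298", 9, "exposed", "Exposed and functionally important"),
    ("N292", 9, "exposed", "Exposed and functionally important"),
    ("E284", 9, "exposed", "Exposed and functionally important"),
    ("L220", 7, "buried", "Buried"),
    ("R156", 8, "exposed", "Exposed and functionally important"),
    ("C133", 4, "buried", "Buried"),
    ("G132", 7, "buried", "Buried"),
    ("Y108", 2, "exposed", "Exposed"),
    ("F82", 9, "buried", "Buried and structurally important"),
    ("D63", 9, "exposed", "Exposed and functionally important"),
]


def consurf_label(grade: int, burial: str) -> str:
    """Assumed rule inferred from Table 3 (not ConSurf's code):
    exposed and grade >= 8 -> functionally important;
    buried and grade == 9 -> structurally important."""
    if burial == "exposed":
        return "Exposed and functionally important" if grade >= 8 else "Exposed"
    if burial == "buried":
        return "Buried and structurally important" if grade == 9 else "Buried"
    raise ValueError(f"unknown burial state: {burial!r}")


if __name__ == "__main__":
    mismatches = [r for r, g, b, label in TABLE3 if consurf_label(g, b) != label]
    print("mismatches:", mismatches)  # []
    print(Counter(consurf_label(g, b) for _, g, b, _ in TABLE3))
```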


3.5. 3D modelling of CEBPA and its mutants


I-TASSER was used to generate the 3D structure of wild-type CEBPA, using the 2nbiA and 5jcsS templates. Of the
five models it generated for the wild-type protein, the structure with the highest C-score (1.43) was selected.
The selected mutant structures were then generated with MODELLER v9.22, using the I-TASSER wild-type model as
template, and the RMSD of each mutant structure was calculated. RMSD measures the average distance between the
α-carbon backbones of the mutant and wild-type models, so higher values indicate greater deviation between the
mutant and wild-type structures. R339Q had the highest RMSD (3.74 Å) and K298Q the lowest (0.43 Å). Mutant
structures with an RMSD greater than 2.0 Å were considered affected; 11 mutants met this criterion. Details of
all structures are provided in Table 4. Four mutants, K298Q (0.43 Å), E334K (0.49 Å), D63N (1.41 Å) and E334Q
(1.47 Å), were excluded from final modelling because their RMSD values were below 2.0 Å. The wild-type structure
and the final selected mutant residues are presented in Figures 5 and 6, respectively.





Fig. 5. 3D structure of wild type CEBPA protein modelled by I-TASSER 
[Figure panels labelled with wild-type/mutant residue pairs: E284/A284, R333/C333, D63/N63, R327/W327, R339, R156/W156]


Fig. 6. 3D structures of the modelled mutants of CEBPA protein modelled by MODELLER v9.22. 


Table 4. RMSD values of the selected mutant CEBPA proteins compared with wild-type CEBPA.

Amino acid change | RMSD
L345P | 3.54 Å
R339Q | 3.74 Å
E334K | 0.49 Å
E334Q | 1.47 Å
R333C | 3.55 Å
V328G | 3.41 Å
R327W | 3.60 Å
L317Q | 3.58 Å
K298Q | 0.43 Å
N292S | 3.38 Å
E284A | 3.20 Å
R156W | 3.49 Å
Y108N | 3.53 Å
F82L | 3.38 Å
D63N | 1.41 Å
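The RMSD criterion can be made concrete with a short sketch: RMSD is the square root of the mean squared distance between paired α-carbon positions, and mutants above 2.0 Å were retained. The function below assumes the two coordinate sets are already superposed (TM-align performs the superposition; no fitting is done here), and the RMSD values are those reported in Table 4.

```python
import math

# RMSD (Å) of each mutant model versus the wild-type model, as reported in Table 4
TABLE4_RMSD = {
    "L345P": 3.54, "R339Q": 3.74, "E334K": 0.49, "E334Q": 1.47, "R333C": 3.55,
    "V328G": 3.41, "R327W": 3.60, "L317Q": 3.58, "K298Q": 0.43, "N292S": 3.38,
    "E284A": 3.20, "R156W": 3.49, "Y108N": 3.53, "F82L": 3.38, "D63N": 1.41,
}
RMSD_CUTOFF = 2.0  # Å; mutants above this were considered structurally affected


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already superposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def affected_mutants(rmsd_by_mutant, cutoff=RMSD_CUTOFF):
    """Mutants whose RMSD exceeds the cut-off, in input order."""
    return [name for name, value in rmsd_by_mutant.items() if value > cutoff]


if __name__ == "__main__":
    print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
    print(affected_mutants(TABLE4_RMSD))               # the 11 retained mutants
```

Applied to Table 4, the filter returns exactly the 11 mutants named in the Conclusions.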


3.6. Predicted PTMs (post-translational modifications)





Protein function can be predicted through an extensive study of post-translational modifications (PTMs).
GPS-MSP 3.0, used for methylation prediction, predicted no methylation sites in CEBPA, which means that nsSNPs
might not affect methylation sites. NetPhos 3.1, used to predict possible phosphorylation sites, predicted 28
potential phosphorylation sites, of which 16 were serine, 6 threonine and 6 tyrosine residues (Table 5).
BDM-PUB and UbPred were used to predict potential ubiquitylation sites: of the 15 lysine residues in total,
BDM-PUB predicted 6 and UbPred 12 to be ubiquitinated (Table 6).


Table 5. Prediction of Phosphorylation Sites by NetPhos 3.1 in CEBPA protein 





0.538 CKI 
0.563 


0.748 cdk5, p38MAPK, GSK3 
0.508 
0.516 CKII, cdc2 
266 0.655
CKII 

Threonine (T) 
0.567 DNAPK 


0.875 
Tyrosine (Y) 0.520 INSR
0.826 


0.981 unsp 


0.522 





Table 6. CEBPA Ubiquitination Prediction Results by UbPred and BDM-PUB 
3.7. Gene-gene interaction of CEBPA


GeneMANIA and STRING were used to predict interactions of the CEBPA gene with other genes. GeneMANIA results
showed that CEBPA physically interacts with AFP, TGFB1, UHRF1, TOP2A, TK1, TRIM26, NCOA3, PREB, EBF1, ADH7,
RUNX1T1, SLC2A4, MMP11, ONECUT1, TNF, DEFA3, UBP1, FDPS, LYZ and PCBP2. The GeneMANIA and STRING predictions
of gene interactions are shown in Figures 7 and 8, respectively.
Fig. 7. GeneMANIA prediction of gene-gene interactions between human CEBPA and other genes.
Fig. 8. STRING prediction of possible interactions between human CEBPA and other genes.
4. DISCUSSION


Homozygous CEBPA mutations in AML have been described in several studies. In three cases of AML, homozygous
CEBPA mutations involved the CEBPA locus through mitotic recombination”. A recent study also identified
mutations in the CEBPA gene; here, we used the different approaches described in the Methodology to identify
nsSNPs™. CEBPA loss-of-function mutations have been associated with deletion of 9q within a non-complex
karyotype in AML”. Many studies have reported different types of SNPs significantly associated with different
types of cancer. For AML, this is the first detailed study of its type to highlight disease-associated SNPs by
investigating the CEBPA gene. From dbSNP, the largest SNP database, 1616 SNPs were recruited, comprising 269
nonsynonymous SNPs, 190 in the 5’UTR, 365 in the 3’UTR, 152 coding synonymous SNPs and 540 others. Further
analysis showed that 7 SNPs introduce stop codons, which directly affect protein structure and can possibly
lead to disease. Of the 269 nsSNPs, 31 lacked sufficient data and were not included in our analysis.


Four bioinformatics tools were used to assess the effect of nsSNPs on the function and structure of the CEBPA
protein. PhD-SNP predicted 17.23%, SNPs&GO 8.82%, SIFT 39.92% and PROVEAN 25.63% of the nsSNPs to be
deleterious or intolerant. The 20 nsSNPs predicted as deleterious by all four tools were then submitted to
PolyPhen2, which classifies variants as benign, possibly damaging or probably damaging. The 15 nsSNPs predicted
as probably damaging, the most confident category, with scores of approximately 1 on its 0 to 1 scale, were
considered for further analysis. These 15 nsSNPs were submitted to I-Mutant to predict the stability of the
CEBPA protein upon each amino acid substitution: 14 were predicted to decrease protein stability, whereas G132C
was predicted to increase it. The 14 destabilizing nsSNPs might therefore cause greater damage to the CEBPA
protein, and G132C was excluded because it increases stability. After protein modelling, we cross-checked our
stability predictions with the CUPSAT server, which predicts protein stability changes upon mutation from the
protein structure. Our I-Mutant predictions agreed with the CUPSAT predictions in 81.25% of cases, which
supports their reliability. ConSurf, which combines evolutionary conservation data with prediction of solvent
accessibility, was used to obtain the conservation profile of the CEBPA protein. Highly conserved residues are
predicted to be functionally or structurally important depending on whether they lie on the protein surface or
in the core?°. Amino acids involved in protein-protein interactions are expected to be more conserved, and
nsSNPs located in conserved regions are the most damaging”. ConSurf showed the possible effects of nsSNPs on
the CEBPA conservation profile and provided a structural representation of the protein. Although results were
given for every amino acid residue of CEBPA, our priority was the location of the identified nsSNPs. According
to ConSurf, L345, L317, L220, C133 and G132 were predicted to be buried; C357, R339, E334, R333, R327, K298,
N292, E284, R156 and D63 to be exposed and functionally important; and V328 and F82 to be buried and
structurally important. Our results showed that the nsSNPs located in highly conserved regions were the most
damaging to CEBPA protein function and structure.
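The per-tool percentages quoted above can be reproduced from the counts in Section 3.2. The sketch below assumes the denominator is the 238 nsSNPs actually analysed (the 269 recruited nsSNPs minus the 31 excluded for lack of data); with that assumption the arithmetic matches all four reported percentages.

```python
# Number of nsSNPs called deleterious/intolerant by each tool (Section 3.2)
DELETERIOUS_COUNTS = {"PhD-SNP": 41, "SNPs&GO": 21, "SIFT": 95, "PROVEAN": 61}
RECRUITED_NSSNPS = 269
EXCLUDED_NSSNPS = 31  # nsSNPs lacking enough data


def percentages(counts, analysed):
    """Percentage of analysed nsSNPs flagged by each tool, rounded to 2 decimals."""
    return {tool: round(100 * n / analysed, 2) for tool, n in counts.items()}


if __name__ == "__main__":
    analysed = RECRUITED_NSSNPS - EXCLUDED_NSSNPS  # 238
    print(percentages(DELETERIOUS_COUNTS, analysed))
    # {'PhD-SNP': 17.23, 'SNPs&GO': 8.82, 'SIFT': 39.92, 'PROVEAN': 25.63}
```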


Several tools were used to predict post-translational modification (PTM) sites. GPS-MSP 3.0 predicted no
methylation sites in CEBPA, which means that nsSNPs might not affect methylation sites. NetPhos 3.1 predicted
28 potential phosphorylation sites, of which 16 were serine, 6 threonine and 6 tyrosine residues. BDM-PUB and
UbPred were used to predict ubiquitination sites: of the 15 lysine residues in total, UbPred predicted 12 and
BDM-PUB 6 to be ubiquitinated.


STRING and GeneMANIA were used to predict gene-gene interactions. STRING reported a combined score for each
predicted partner gene and identified PPARG, NCOA1, NCOA3, CREBBP, JUN, FOS, CREB1, ATF3, MAPK8, MAPKS, JUNB,
RELA, TP53, RUNX1, EP300, PPARGC1A, ADIPOQ, FABP4, LEP and KLF5. GeneMANIA showed that CEBPA physically
interacts with AFP, TGFB1, UHRF1, TOP2A, TK1, TRIM26, NCOA3, PREB, EBF1, ADH7, RUNX1T1, SLC2A4, MMP11,
ONECUT1, TNF, DEFA3, UBP1, FDPS, LYZ and PCBP2.
Our study provides the detailed analysis and information needed to identify damaging nsSNPs. Like every study,
ours has limitations: it is based on web servers and computational tools that rely mainly on statistical and
mathematical algorithms, so experimental investigation is necessary to confirm these results. Nevertheless, our
study provides insight into the 3D structure of the CEBPA protein, its nsSNPs, its gene-gene interactions and
its potential PTM sites, which might be helpful in future studies of CEBPA aimed at understanding its role in
AML and related diseases.


5. CONCLUSIONS 


Our study identified 11 nsSNPs as the most deleterious: L345P, R339Q, R333C, V328G, R327W, L317Q, N292S,
E284A, R156W, Y108N and F82L. These 11 nsSNPs are considered very important and may play an active role in
diseases associated with the CEBPA gene, especially AML. Our study also showed that CEBPA is correlated with
other genes in many pathways, so any change in the function or structure of the CEBPA protein will ultimately
affect these pathways, underlining its importance. These nsSNPs are significant as potential therapeutic
targets and for personalized medicine, and can be used as diagnostic markers for AML. Although our study was
detailed, our predictions still need to be confirmed experimentally, using in vitro approaches and in vivo
models such as mice.